(57) Abstract

A dynamically configurable video signal processing system processes data in the form of hierarchical layers. The system partitions 
data between hierarchical layers and allows variation in the number of layers employed. Data is automatically partitioned into one or more 
hierarchical layers as a function of one or more parameters selected from available system bandwidth, input data rate, and output signal 
quality. In addition, the image resolution and corresponding number of pixels per image of the data may be varied as a function of system 
parameters. Both encoder (100) and decoder (105; 107; 109) systems are disclosed. 



WO 97/01934

PCT/IB96/00595



System for Encoding and Decoding Layered Compressed Video Data

This invention is related to the field of digital image 
signal processing, and more particularly to a system for processing 
hierarchical video data. 

An objective in the development of digital video
encoding and decoding formats has been to provide a standard
that accommodates different video transmission and reception
systems. A further objective has been to promote interoperability
and backward compatibility between different generations and
types of video encoding and decoding equipment. In order to
promote such interoperability and compatibility, it is desirable to
define encoding and decoding strategies which can accommodate
different types of video image scan (e.g. interlaced/progressive),
frame rate, picture resolution, frame size, chrominance coding, and
transmission bandwidth.

One strategy used to achieve interoperability
involves separating video data into one or more levels of a data
hierarchy (layers) organized as an ordered set of bitstreams for
encoding and transmission. The bitstreams range from a base
layer, i.e. a datastream representing the simplest (e.g. lowest
resolution) video representation, through successive enhancement
layers representing incremental video picture refinements. The
video data is reconstructed from the ordered bitstreams by a
decoder in a receiver. This strategy permits decoder complexity to
be tailored to achieve the desired video picture quality. A decoder
may range from the most sophisticated configuration, which decodes
the full complement of bitstreams (that is, all the enhancement
layers), to the simplest, which decodes only the base layer.

A widely adopted standard that uses such a data
hierarchy is the MPEG (Moving Picture Experts Group) image
encoding standard (ISO/IEC 13818-2, 10th May 1994), hereinafter
referred to as the "MPEG standard". The MPEG standard details
how the base and enhancement layer data may be derived, and
how the video data may be reconstructed from the layers by a
decoder. It is herein recognized that it is desirable to provide a
system that incorporates encoder and decoder architectures for
rationally partitioning data between the various layers and for
dynamically configuring such a system for this purpose.

In accordance with the principles of the present
invention, dynamically configurable video signal processing
systems enable data allocation among hierarchical layers to be
varied. The dynamically configurable systems also permit the data
to be partitioned between the hierarchical layers as desired, and
allow variation in the number of layers employed.

A disclosed digital signal processing system according
to the present invention adaptively processes a datastream of
image representative input data. A data processor automatically
partitions input data into one or more hierarchical layers as a
function of one or more parameters selected from available
system bandwidth, input data rate, and output signal quality.

Also disclosed is a digital signal processing system for
adaptively decoding a datastream of image representative input
data partitioned into one or more hierarchical layers. The decoding
system derives synchronization and configuration information
from the input data and is adaptively configured to decode the
number of hierarchical layers of the input data in response to a
locally generated Control signal.

In accordance with a feature of the invention, the
image resolution and corresponding number of pixels per image is
varied as a function of system parameters.

Brief Description of the Drawings

In the drawing: 

Figure 1 shows an exemplary dynamically configurable
video signal encoding and decoding architecture, according to the
invention.

Figure 2 depicts an exemplary graph of Peak Signal to
Noise Ratio (PSNR) plotted against Bit Rate that indicates different
coding strategy regions, according to the invention.

Figure 3 presents a flowchart of a control function
used for determining the Figure 1 architecture, according to the
invention.

Figure 4 shows the encoding and decoding system of
Figure 1 in the context of an MPEG compatible encoding and
decoding system.

Figure 5 depicts encoder and decoder architecture,
according to the invention, for region A type encoding and
decoding.

Figure 6 shows encoder and decoder architecture,
according to the invention, for region B type encoding and
decoding.

Figure 7 shows encoder and decoder architecture,
according to the invention, for region C type encoding and
decoding.

Figure 8 is a variation of Figure 1 with an additional
architecture configuration for region A decoding, according to the
invention.

Figure 9 is a variation of Figure 1 with an additional
architecture configuration for region C decoding, according to the
invention.

Figure 10 presents a flowchart of a method for
identifying the region type of the input data, according to the
invention.

The MPEG standard refers to the processing of
hierarchical ordered bitstream layers in terms of "scalability". One
form of MPEG scalability, termed "spatial scalability", permits data
in different layers to have different frame sizes, frame rates and
chrominance coding. Another form of MPEG scalability, termed
"temporal scalability", permits the data in different layers to have
different frame rates, but requires identical frame size and
chrominance coding. In addition, "temporal scalability" permits an
enhancement layer to contain data formed by motion dependent
predictions, whereas "spatial scalability" does not. These types of
scalability, and a further type termed "SNR scalability" (SNR is
Signal to Noise Ratio), are further defined in section 3 of the MPEG
standard.

An embodiment of the invention employs MPEG
"spatial" and "temporal" scalability in a 2 layer hierarchy (base
layer and single enhancement layer). The enhancement layer data
accommodates different frame sizes but a single frame rate and a
single chrominance coding format. Two exemplary frame sizes
correspond to HDTV (High Definition Television) and SDTV
(Standard Definition Television) signal formats as proposed by the
Grand Alliance HDTV specification in the United States, for
example. The HDTV frame size is 1080 lines with 1920 samples
per line (giving 1080 x 1920 pixels per image), and the SDTV
frame size is 720 lines with 1280 samples per line (giving 720 x
1280 pixels per image). Both the HDTV and SDTV signals employ a
30 Hz interlaced frame rate and the same chrominance coding
format.
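As a quick arithmetic check on the two frame sizes quoted above, the following minimal Python sketch computes the pixels per image for each format and their ratio. The constant and function names are illustrative only and do not appear in the specification.

```python
# Frame sizes as (lines, samples per line), taken from the text above.
HDTV_FRAME = (1080, 1920)
SDTV_FRAME = (720, 1280)


def pixels_per_image(frame):
    """Return the number of pixels in one image of the given frame size."""
    lines, samples_per_line = frame
    return lines * samples_per_line


hdtv_pixels = pixels_per_image(HDTV_FRAME)  # 2,073,600 pixels
sdtv_pixels = pixels_per_image(SDTV_FRAME)  # 921,600 pixels

# Each dimension differs by a factor of 3/2, so the pixel count
# differs by (3/2) squared.
pixel_ratio = hdtv_pixels / sdtv_pixels
```

The ratio of 2.25 follows from the 3:2 relationship between the 1080 line and 720 line formats in each dimension.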

Although the disclosed system is described in the
context of such an MPEG compatible, two layer HDTV and SDTV
spatially and temporally scalable application, this application is
exemplary only. The disclosed system may be readily extended by
one skilled in the art to more than two layers of video data
hierarchy and other video data resolutions (not only 720 and 1080
line resolution). Additionally, the principles of the invention may
be applied to other forms of scalability, such as SNR scalability,
and also may be used to determine a fixed optimum encoder and
decoder architecture. The principles of the invention have particular
application in TV coding (HDTV or SDTV), Very Low Bit Rate
Coding (e.g. video conferencing) and digital terrestrial
broadcasting for optimizing encoder and decoder apparatus for a
desired communication bandwidth.

Figure 1 shows a dynamically configurable video
signal encoding and decoding architecture according to the
invention. In overview, an input video datastream is compressed
and allocated between a base (SDTV) data layer and an
enhancement (HDTV) data layer by encoder 100. The allocation is
performed in accordance with principles of the invention under
the control of bandwidth and architecture control unit 120. The
resulting compressed data from encoder 100 in the form of single
or dual bitstreams is formed into data packets including
identification headers by formatter 110. The formatted data
output from unit 110, after transmission over a data channel, is
received by transport processor 115. The transmission and
reception process is described later in connection with the
encoding and decoding system depicted in Figure 4.

Transport processor 115 (Figure 1) separates the
formatted compressed bitstream data according to layer type, i.e.
base or enhancement layer data, based on an analysis of header
information. The data output from transport processor 115 is
decompressed by decoder 105. The architecture of decoder 105 is
determined in accordance with principles of the invention under
the control of bandwidth and architecture control unit 145. The
resulting decompressed data output from decoder 105, in the form
of single or dual decompressed bitstreams, is suitable for encoding
as an NTSC format signal and for subsequent display.
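The layer separation performed by transport processor 115 can be pictured with a minimal Python sketch. The packet layout used here (a dictionary with a "header" containing a "layer" field, plus a "payload") is purely hypothetical and is not the MPEG transport syntax; it only illustrates routing packets by header analysis.

```python
from collections import defaultdict


def separate_by_layer(packets):
    """Split packets into base and enhancement payload lists, keeping order.

    Each packet is assumed (hypothetically) to look like
    {"header": {"layer": "base" | "enhancement"}, "payload": bytes}.
    """
    streams = defaultdict(list)
    for packet in packets:
        layer = packet["header"]["layer"]  # hypothetical header field
        if layer not in ("base", "enhancement"):
            raise ValueError(f"unknown layer type: {layer!r}")
        streams[layer].append(packet["payload"])
    return streams["base"], streams["enhancement"]


example_packets = [
    {"header": {"layer": "base"}, "payload": b"b0"},
    {"header": {"layer": "enhancement"}, "payload": b"e0"},
    {"header": {"layer": "base"}, "payload": b"b1"},
]
base_stream, enhancement_stream = separate_by_layer(example_packets)
```

When only a single layer is transmitted, one of the two returned lists is simply empty, which corresponds to the single bitstream case mentioned above.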

Considering the dynamically configurable architecture of
Figure 1 in detail, an input video datastream is compressed and
allocated between a base SDTV data layer and an enhancement
HDTV layer by encoder 100. Bandwidth and architecture control
unit 120 configures the encoder 100 architecture to appropriately
allocate data between the HDTV and SDTV output layers from
units 125 and 135 respectively. The appropriate data allocation
depends on a number of system factors including bandwidth,
system output data rate constraints, the data rate and picture
resolution (number of pixels per image) of the input video data,
and the picture quality and resolution (number of pixels per
image) required at each layer. In the described system, the image
resolution between input and output of both encoder 100 and
decoder 105 is varied by changing the number of pixels per image
as described in greater detail later.

The data allocation and encoding strategy is derived by
determining the minimum number of bits per unit time required
to represent the video input sequence at the output of encoder
100 for a specified distortion. This is the Rate Distortion Function
for encoder 100. The Rate Distortion Function is evaluated,
assuming the input sequence is a Gaussian distribution source
signal of mean μ and standard deviation σ. Further, applying a
squared-error criterion to the Rate Distortion Function, R, of such a
Gaussian input sequence, in accordance with the theory presented
in section 13.3.2 of "Elements of Information Theory" by T. M.
Cover and J. A. Thomas, published by J. Wiley & Sons, 1991, gives,

$$R = \max\left(0,\ \tfrac{1}{2}\log_2\frac{\sigma^2}{D}\right) \quad \text{(bits per second)}$$

that is,

$$R = \begin{cases} \tfrac{1}{2}\log_2\dfrac{\sigma^2}{D} & \text{if } 0 < D < \sigma^2 \\ 0 & \text{if } D > \sigma^2. \end{cases}$$

Therefore, the Distortion Rate Function, D, is given by,

$$D = \sigma^2\, 2^{-2R}$$

which, when represented as a Peak Signal to Noise Ratio (PSNR), is

$$D_{PSNR} = 10\log_{10}\left(\frac{255^2}{\sigma^2}\right) + 20\log_{10}(2)\cdot R$$
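The relationships above can be checked numerically with a short Python sketch. It assumes the Gaussian, squared-error form of the Rate Distortion Function with the PSNR expressed in decibels relative to a peak value of 255; the function names are illustrative and not part of the specification.

```python
import math


def rate_for_distortion(variance, distortion):
    """Rate R needed to reach a squared-error distortion D.

    Gaussian source with the given variance (sigma squared).
    """
    if variance <= 0 or distortion <= 0:
        raise ValueError("variance and distortion must be positive")
    return max(0.0, 0.5 * math.log2(variance / distortion))


def distortion_for_rate(variance, rate):
    """Distortion Rate Function: D = sigma^2 * 2^(-2R)."""
    return variance * 2.0 ** (-2.0 * rate)


def psnr_db(variance, rate):
    """Distortion expressed as PSNR in dB for 8-bit samples (peak 255)."""
    return 10.0 * math.log10(255.0 ** 2 / variance) + 20.0 * math.log10(2.0) * rate
```

Each additional bit of rate raises the PSNR by 20 log10(2), roughly 6 dB, which is why the single layer curves are straight lines when PSNR is plotted against bit rate.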



Figure 2 is a graphical representation of Distortion Peak
Signal to Noise Ratio D_PSNR in decibels (dB), plotted against the
Bit Rate of an Enhancement layer (bits per second) for a two layer
spatial encoded system. Curves are plotted for a base layer
distortion function, an enhancement layer distortion function, and
a distortion function for an exemplary upsampled base layer for a
1080 line interpolation of a 720 line picture. The base layer and
upsampled base layer curves have a negative slope because as the
bit rate of the Enhancement layer increases, the base layer bit rate
decreases. The composite distortion curve for the 2 layer system is
shown by the thick black line of Figure 2. This composite
Distortion curve is a linearized approximation to the minimum
Distortion obtainable for the 2 layer system employing an
upsampled base layer.

An encoding and decoding strategy is derived from the
two layer system results depicted in Figure 2. In particular, three
regions A, B and C are identified in which advantage can be gained
by adopting different encoding and decoding approaches. The
boundaries of these regions may vary depending on the system
bandwidth, system output data rate constraints, the data rate and
picture resolution of the input video data and the picture quality
and resolution required at each layer. The regions are identified as
follows.

Region A.

In region A there is insufficient allocable bandwidth to
achieve the required picture quality using either two layer
encoding or a single high resolution layer encoding. In this region
the video quality of a decoded upsampled base layer equals or
exceeds the quality of a decoded picture derived from combined
base layer and enhancement layer data. This region is bounded at
its upper end at a point X on the enhancement layer curve that
gives a picture quality (D_PSNR value) equivalent to that of the
upsampled base layer curve at the zero Bit Rate Enhancement
layer point Y.

In region A there is an advantage in allocating the full
available system bandwidth to the encoding and compression of a
single layer (the base layer) at a reduced spatial resolution with a
reduced number of pixels per image. This strategy may be
implemented in various ways. One way, for example, is to
downsample an input datastream to provide a single base layer
(SDTV) for transmission, and then to decode the corresponding
received base layer to provide an SDTV decoded output upon
reception. A higher resolution HDTV decoded output may be
produced at a receiver in addition to the SDTV decoded output by
upsampling (oversampling) the decoded SDTV output. The
advantage of this strategy arises because scarce bandwidth is
more efficiently used when it is allocated to encode a lower
resolution single layer bitstream than when it is used to encode
either two layers or a single high resolution layer. This is because
these latter approaches typically incur greater encoding overhead
associated with required additional error protection and data
management code, for example. The region A type of situation
may occur, for example, when the total available system
bandwidth is insufficient to support full resolution encoding. The
advantage of the region A encoding approach may also arise in
other situations, for example, when an input datastream to be
encoded contains significant non-translational motion. Then,
region A spatial down and up sampling may provide better
picture quality in a bandwidth constrained system than can be
provided by motion compensated prediction encoding. This is
because of the overhead associated with such motion
compensation. The region A operation is discussed in greater
detail in connection with Figure 5.

Region B.

In region B, there is sufficient system bandwidth to
meet the required output picture quality using a two layer
encoding strategy. In this region, the available system bandwidth
is allocated between layers so that the quality requirements of
both the decoded high and low resolution outputs are met. This
region lies between region A and region C.

In region B, the system bandwidth is allocated in
accordance with picture quality requirements between high
resolution and low resolution signal output layers. The two output
layers may be encoded for transmission in various ways. One way,
for example, is to downsample and encode the high resolution
input datastream to provide a low resolution (SDTV) layer for
transmission, and to decode this low resolution layer when
received to provide a low resolution SDTV signal. The high
resolution (HDTV) enhancement layer to be transmitted may be
derived from a combination of an upsampled version of the
encoded SDTV layer and previous frames of the encoded HDTV
layer. The HDTV decoded output may be derived from a
combination of an upsampled version of the decoded SDTV output
and the received encoded HDTV layer. This operation is discussed
in greater detail in connection with Figure 6.

Region C.

In region C, the required picture quality cannot be
achieved by allocating the system bandwidth either to encode two
layers or to encode a single (low resolution) layer. In this region, a
high quality output video signal may be achieved, given the
system bandwidth constraint, by encoding a single high resolution
layer. This region is bounded by a point V on the enhancement
layer curve that provides the level of picture quality required as a
minimum for the base layer alone (equal to D_PSNR value W of
Figure 2).

In region C there is an advantage in allocating the full
system bandwidth to the encoding and compression of a single
layer (the enhancement layer) at full spatial resolution with a full
number of pixels per image. This strategy may be implemented in
various ways. One way, for example, is to encode the input
datastream at full spatial resolution as a single high resolution
enhancement (HDTV) layer for transmission, and to decode the
corresponding received enhancement layer to provide the high
resolution HDTV output. At a receiver, a low resolution (SDTV)
output may be derived from the received high resolution signal by
downsampling in the compressed or decompressed domain as
described later. The advantage of this region C strategy arises
because, given the required output picture quality level, the
available bandwidth is more efficiently used when it is allocated
to encode a single high resolution layer rather than when it is
used to encode two layers for transmission. This is because two
layer encoding requires additional error protection and data
management overhead information. This region C operation is
discussed in greater detail in connection with Figure 7.

The three regions (A, B and C) identified for the 2 layer
system of Figure 2 may not all be present in every 2 layer system.
For example, only one or two regions may be identified depending
on the system bandwidth, system data rate constraints, and the
picture quality and resolution required at each layer. Conversely,
in systems involving more than two layers, more than three
regions may be identified in accordance with the principles of the
invention. However, irrespective of the number of data regions
identifiable in a system, adequate decoded picture quality may be
achieved using encoding and decoding architectures configurable
for only a limited number of the identifiable regions.

The different encoding and decoding strategies
associated with regions A, B and C are implemented in the
dynamically configurable architecture of Figure 1. In encoder 100,
the appropriate strategy and architecture for allocating data
between the HDTV and SDTV output layers is determined by
control unit 120. Control unit 120, e.g. including a microprocessor,
configures the architecture of encoder 100 using the process
shown in the flowchart of Figure 3. Control unit 120 first identifies
the region type of the input data in step 315 of Figure 3 following
the start at step 310. The region type is determined in accordance
with the previously discussed principles based on factors
including the available system bandwidth, the data rate of the
input datastream and the picture quality required of each
decompressed output layer. These factors may be
pre-programmed and indicated by data held in memory within control
unit 120 or the factors may be determined from inputs to control
unit 120. For example, the data rate may be sensed directly from
the input datastream. Also, externally sourced inputs may
originate from operator selection, for instance, and be input to
control unit 120 via a computer interface, for example. In one
implementation, for example, control unit 120 may derive input
data rate threshold values establishing the boundaries between
regions A, B and C based on the pre-programmed values indicating
system bandwidth and required picture quality of each
decompressed output layer. Then, control unit 120 adopts the
appropriate region A, B or C encoding strategy based on the data
rate of the input datastream reaching particular thresholds.
Alternatively, the input data rate threshold values may
themselves be pre-programmed within unit 120.

The region type of the input data is identified in step
315 of Figure 3 using the method shown in the flowchart of Figure
10. In step 515 of Figure 10, following the start at step 510, a
single hierarchical layer and 1080 line image resolution is initially
selected for encoding the data in the coding region. The predicted
Distortion factor for the input data when it is encoded as a single
layer for transmission with 1080 line resolution is computed in
step 525. Step 530 directs that steps 515 and 525 are repeated to
compute the Distortion factors for a single layer encoding
implementation with 720 line resolution. Also, step 530 directs
that steps 515 and 525 are further repeated to compute the
Distortion factors for a two layer encoding implementation with
both 720 and 1080 line resolutions. The resultant Distortion
factors are compared and the image resolution and number of
hierarchical layers used for encoding are determined in step 540.
The selection process ends at step 550. The number of layers and
image resolution are selected in step 540 to give the minimum
Distortion factor. This layer and resolution selection process
implements the coding region identification function of step 315
(Figure 3). It should be noted that this method of partitioning
encoded input data is also usable for a variety of applications in
which data is to be prepared for transmission and is not restricted
to image processing. For example, the process may be used for
telephony, satellite or terrestrial communication including
microwave and fibre-optic communication. Further, this process
can encompass other types of data and the partitioning of data
into other types of data segments or data packets, not just
hierarchical layers of encoded data. The process may also
encompass different numbers of data segments and data
resolutions, not just the two layers and the two data resolutions
described with respect to the preferred embodiment.
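The Figure 10 selection loop can be sketched in Python as a minimum search over the three candidate configurations. This is only an illustration under stated assumptions: the distortion predictor is supplied by the caller as a hypothetical callable, because the text does not specify how the predicted Distortion factor is computed, and all names are invented for the example.

```python
# Candidate configurations: (number of layers, line resolutions used).
CANDIDATES = [
    (1, (1080,)),      # single layer, 1080 line resolution
    (1, (720,)),       # single layer, 720 line resolution
    (2, (720, 1080)),  # two layers, 720 and 1080 line resolutions
]

# Region associated with each configuration, following the region
# descriptions in the text (A: single low resolution layer, B: two
# layers, C: single high resolution layer).
REGION_OF = {
    (1, (720,)): "A",
    (2, (720, 1080)): "B",
    (1, (1080,)): "C",
}


def select_configuration(predict_distortion, candidates=CANDIDATES):
    """Return the candidate with the minimum predicted Distortion factor.

    predict_distortion is a caller-supplied (hypothetical) function that
    maps a candidate to its predicted distortion; lower is better.
    """
    return min(candidates, key=predict_distortion)


# Usage with made-up distortion values, purely for illustration.
example_distortion = {
    (1, (1080,)): 5.0,
    (1, (720,)): 3.0,
    (2, (720, 1080)): 4.0,
}
chosen = select_configuration(example_distortion.__getitem__)
chosen_region = REGION_OF[chosen]
```

Extending the candidate list is enough to cover more layers or other resolutions, in line with the remark that the process is not limited to two layers and two resolutions.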

If region A is selected, step 320 (Figure 3) directs that
step 325 is performed and encoder 100 is configured for a type A
architecture. In addition, formatter 110 encodes the transmitted
bitstream to indicate the region type of the data and the
appropriate decoding architecture using information provided by
control unit 120. Decoder 105 is compatibly configured to decode
the transmitted region A type data in response to the encoded
architecture information. If the data is region C type, step 330
directs that step 335 is performed. Step 335 provides that encoder
100 is configured for a region C architecture, and the transmitted
bitstream is updated to indicate the data and decoding
architecture type in the manner described for region A. If the data
is not region C type, step 330 directs that step 340 is performed.
Step 340 provides that encoder 100 is configured for a region type
B architecture and the transmitted bitstream is updated to
indicate the data and decoding architecture type in the manner
described for region A.
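The Figure 3 decision sequence can be summarized in a small Python sketch. The unit names and the returned dictionary are hypothetical. The set of active units for region A follows the statement that compressor 125 and upsampler 130 are disabled in that mode; the sets shown for regions B and C are inferred from the region descriptions and should be read as assumptions.

```python
from enum import Enum


class Region(Enum):
    A = "A"
    B = "B"
    C = "C"


# Encoder units left active in each mode (names are illustrative).
# Region A is taken from the text; regions B and C are assumptions.
ACTIVE_UNITS = {
    Region.A: {"downsampler_140", "sdtv_compressor_135"},
    Region.B: {"downsampler_140", "sdtv_compressor_135",
               "upsampler_130", "hdtv_compressor_125"},
    Region.C: {"hdtv_compressor_125"},
}


def configure_encoder(region):
    """Mirror the Figure 3 flow: test for A, then for C, otherwise B."""
    if region is Region.A:    # step 320 leads to step 325
        architecture = Region.A
    elif region is Region.C:  # step 330 leads to step 335
        architecture = Region.C
    else:                     # step 330 leads to step 340
        architecture = Region.B
    return {
        "architecture": architecture.value,
        "active_units": ACTIVE_UNITS[architecture],
        # Region type signalled in the bitstream so that the decoder
        # can be configured compatibly.
        "signalled_region": architecture.value,
    }
```

Signalling the same region value in the bitstream is what lets the decoder adopt the matching architecture for each transmitted data packet.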


Control unit 120 configures encoder 100 via a
Configuration signal C1 that is provided to each of the constituent
elements of encoder 100. Control unit 120 updates the
configuration of encoder 100 for individual input data packets
where each data packet consists of sequences of code words and
represents a group of pictures, e.g. a Group of Pictures in
accordance with the MPEG standard. However, control unit 120
may update the encoder 100 configuration for different data
packet lengths as appropriate for a particular system. For
example, the configuration may be performed at power-on, for
each picture, for each picture stream (e.g. program), for each pixel
block (e.g. macroblock), or at variable time intervals.

In region A operating mode, control unit 120 disables,
via the Configuration signal, both HDTV compressor 125 and 2:3
upsampler 130. In the resulting configuration of encoder 100 a
single SDTV output layer is provided to formatter 110 by unit 135
of encoder 100 for transmission. This configuration is shown and
discussed in connection with Figure 5. Continuing with Figure 1, to
produce the SDTV layer output, 3:2 downsampler 140 reduces the
spatial resolution of the 1080 line resolution input datastream by
a factor of 2/3 to provide a 720 line output. This may be achieved
by a variety of known methods including, for example, simply
discarding every third line or preferably by performing an
interpolation and averaging process to provide two interpolated
lines for every three original lines. The 720 line output from
downsampler 140 is compressed by SDTV compressor 135 to
provide SDTV layer compressed data to formatter 110. The
compression performed by unit 135 employs a temporal
prediction process that uses prior SDTV layer frames stored within
compressor 135. Such a compression process, involving temporal
prediction and Discrete Cosine Transform (DCT) compression, is
known and described, for example, in chapter 3 of the Grand
Alliance HDTV System Specification of April 14, 1994, published
by the National Association of Broadcasters (NAB) Office of Science
and Technology in their 1994 Proceedings of the 48th annual
conference.
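The two 3:2 vertical downsampling options mentioned above can be sketched in Python on a frame held as a list of lines. The interpolation weights used in the second function are an assumption chosen for illustration, since the text does not specify the interpolation and averaging filter, and the function names are hypothetical.

```python
def drop_every_third_line(frame):
    """Keep two of every three lines by discarding the third line."""
    return [line for index, line in enumerate(frame) if index % 3 != 2]


def interpolate_three_to_two(frame):
    """Produce two interpolated lines from each group of three lines.

    The 2:1 and 1:2 weights are illustrative assumptions, not the
    filter of the specification. Trailing lines that do not form a
    complete group of three are ignored.
    """
    output = []
    usable = len(frame) - len(frame) % 3
    for start in range(0, usable, 3):
        first, second, third = frame[start:start + 3]
        output.append([(2 * a + b) / 3 for a, b in zip(first, second)])
        output.append([(b + 2 * c) / 3 for b, c in zip(second, third)])
    return output


# A 1080 line frame with 4 samples per line, for illustration only.
example_frame = [[float(line)] * 4 for line in range(1080)]
dropped = drop_every_third_line(example_frame)
interpolated = interpolate_three_to_two(example_frame)
```

Both functions turn 1080 input lines into 720 output lines; the interpolating version blends neighbouring lines instead of discarding picture information outright.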

The resultant SDTV bitstream is formed into data
packets including identification headers and architecture
information by formatter 110. The architecture information is
provided by control unit 120 and is encoded by formatter 110
into the transmitted bitstream using the "Hierarchy Descriptor"
described in sections 2.6.6 and 2.6.7 of the MPEG image encoding
systems standard (ISO/IEC 13818-1, 10th June 1994). The
architecture information is subsequently used by decoder 105 to
compatibly configure decoder 105 for the appropriate decoding
mode (e.g. region A, B or C mode). The configuration of decoder
105, like encoder 100, is updated for each transmitted data
packet. A data packet contains a group of pictures in this
preferred embodiment.

Although using the MPEG "Hierarchy Descriptor" is the
preferred method of ensuring encoder 100 and decoder 105 are
compatibly configured, other methods are possible. The
architecture information may, for example, be encoded in MPEG
syntax in the "User Data" field defined in section 6.2.2.2.2 of the
MPEG standard. Alternatively, decoder 105 may deduce the
appropriate decoding mode from the bit rate of the encoded
received data stream determined from the bit rate field of the
sequence header per section 6.2.2.1 of the MPEG standard. The
decoder may use this bit rate information together with
pre-programmed data detailing the bandwidth and video quality
requirements of the decoded output to deduce the appropriate
decoding mode in accordance with the previously described
principles of the invention. The decoding mode may be changed,
for example, when the received bit rate reaches pre-programmed
thresholds.

The formatted compressed datastream output from unit
110 is conveyed over a transmission channel before being input to
transport processor 115. Figure 4 shows an overall system
including the elements of Figure 1 as well as transmission and
reception elements 410-435. These transmission and reception
elements are known and described, for example, in the reference
text, Digital Communication, Lee and Messerschmitt (Kluwer
Academic Press, Boston, MA, USA, 1988). Transmission encoder
410 encodes the formatted output from unit 110 (Figures 1 and 4)
for transmission. Encoder 410 typically sequentially scrambles,
error encodes and interleaves the formatted data to condition the
data for transmission prior to modulation by modulator 415.
Modulator 415 then modulates a carrier frequency with the
output of encoder 410 in a particular modulation format, e.g.
Quadrature Amplitude Modulation (QAM). The resultant
modulated carrier output from modulator 415 is then frequency
shifted and transmitted by up-converter and transmitter 420
which may be, for example, a local area broadcast transmitter. It
should be noted that, although described as a single channel
transmission system, the bitstream information may equally well
be transmitted in a multiple channel transmission system, e.g.
where a channel is allocated to each bitstream layer.

The transmitted signal is received and processed by
antenna and input processor 425 at a receiver. Unit 425 typically
includes a radio frequency (RF) tuner and intermediate frequency
(IF) mixer and amplification stages for down-converting the
received input signal to a lower frequency band suitable for
further processing. The output from unit 425 is demodulated by
unit 430, which tracks the carrier frequency and recovers the
transmitted data as well as associated timing data (e.g. a clock
frequency). Transmission decoder 435 performs the inverse of the
operations performed by encoder 410. Decoder 435 sequentially
deinterleaves, decodes and descrambles the demodulated data
output from unit 430 using the timing data derived by unit 430.
Additional information concerning these functions is found, for
example, in the aforementioned Lee and Messerschmidt text.
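
The ordering of the conditioning steps in encoder 410 and their
inversion in decoder 435 can be sketched as follows. This is a toy
model under stated assumptions, not the scheme used by any particular
transmission system: the 7-bit additive scrambler, the three-fold
repetition code standing in for the error code, the four-column
interleaver and all function names are hypothetical choices for
illustration.

```python
def scramble(bits, seed=0b1011101):
    """Additive scrambler: XOR the data with a 7-bit LFSR keystream.

    The keystream does not depend on the data, so applying the same
    function again with the same seed descrambles.
    """
    state, out = seed, []
    for bit in bits:
        feedback = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | feedback) & 0x7F
        out.append(bit ^ feedback)
    return out


def error_encode(bits, repeat=3):
    """Stand-in error code: repeat every bit `repeat` times."""
    return [bit for bit in bits for _ in range(repeat)]


def error_decode(bits, repeat=3):
    """Majority vote over each group of `repeat` received bits."""
    return [int(2 * sum(bits[i:i + repeat]) > repeat)
            for i in range(0, len(bits), repeat)]


def _read_order(length, columns):
    """Source index of each interleaved position (column-by-column read)."""
    return [i for c in range(columns) for i in range(c, length, columns)]


def interleave(bits, columns=4):
    """Reorder bits so that neighbours in the input are spread apart."""
    return [bits[i] for i in _read_order(len(bits), columns)]


def deinterleave(bits, columns=4):
    """Undo interleave() by writing each bit back to its source index."""
    out = [0] * len(bits)
    for position, source in enumerate(_read_order(len(bits), columns)):
        out[source] = bits[position]
    return out


def transmission_encode(bits):
    """Order described for encoder 410: scramble, error encode, interleave."""
    return interleave(error_encode(scramble(bits)))


def transmission_decode(bits):
    """Order described for decoder 435: deinterleave, decode, descramble."""
    return scramble(error_decode(deinterleave(bits)))
```

With these stand-ins, a burst of three adjacent channel errors in a
24-bit block falls into three different repetition groups after
deinterleaving, so the majority vote corrects each of them.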

Transport processor 115 (Figures 1 and 4) extracts
synchronization and error indication information from the
compressed data output from unit 435. This information is used in
the subsequent decompression performed by decoder 105 of the
compressed video data output from processor 115. Processor 115
also extracts decoding architecture information from the MPEG
Hierarchy Descriptor field within the compressed data from unit
435. This architecture information is provided to decoder
bandwidth and architecture control unit 145 (Figure 1). Unit 145
uses this information to compatibly configure decoder 105 for the
appropriate decoding mode (e.g. region A, B or C mode). Control
unit 145 configures decoder 105 via a second Configuration signal
C2 that is provided to each constituent element of decoder 105.

In region A mode, control unit 145 of Figure 1
disables, via the second Configuration signal, both HDTV
decompressor 150 and adaptation unit 165. In the resulting
configuration of decoder 105, the SDTV layer compressed video
output from processor 115 is decompressed by SDTV
decompressor 160 to provide a decompressed 720 line resolution
SDTV output sequence. The decompression process is known and
defined in the previously mentioned MPEG standard. In addition,
upsampler 155 oversamples the 720 line resolution SDTV output
by a factor of 3/2 to provide a 1080 line resolution HDTV
decompressed output. This may be achieved by a variety of
known methods including, for example, interpolation and
averaging to provide three interpolated lines for every two
original lines. The 1080 line resolution decompressed output from
upsampler 155 is selected, via multiplexer 180 in response to the
second Configuration signal, as the HDTV decompressed output
sequence. The resulting decompressed HDTV and SDTV data
outputs from decoder 105 are suitable for encoding as an NTSC
format signal by unit 440 of Figure 4, for example, and for
subsequent display.
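
One of the many possible interpolation methods for the 3/2 line
upsampling can be sketched as linear interpolation between
neighbouring lines. The function name upsample_lines_3_2, the
representation of a picture as a list of lines, and the choice of
linear weights are assumptions made for this sketch; the text leaves
the interpolation method open.

```python
def upsample_lines_3_2(lines):
    """Produce three output lines for every two input lines.

    `lines` is a list of picture lines, each a list of pixel values.
    Output line j is taken at source position 2*j/3 and is linearly
    interpolated between the two nearest input lines.
    """
    if len(lines) % 2:
        raise ValueError("expected an even number of input lines")
    last = len(lines) - 1
    out = []
    for j in range(len(lines) * 3 // 2):
        # Source position 2*j/3 split into a whole line and thirds.
        lower, thirds = divmod(2 * j, 3)
        upper = min(lower + 1, last)  # repeat the last line at the edge
        out.append([((3 - thirds) * a + thirds * b) / 3
                    for a, b in zip(lines[lower], lines[upper])])
    return out
```

Applied to a 720 line picture this returns 1080 lines, and every third
output line coincides with an original line.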

Figure 5 shows the encoder and decoder apparatus of
Figure 1 configured for region A type encoding and decoding. The
functions of the elements shown are as previously described.
Upsampler 130 and HDTV compressor 125, shown in encoder 100
of Figure 1, are absent in Figure 5 since these elements are
disabled in region A mode as previously described. Similarly,
HDTV decompressor 150 and adaptation unit 165, shown in
decoder 105 of Figure 1, are absent in Figure 5 since these
elements are disabled in region A mode, also as previously
described.

If the input data in Figure 1 is region B type, control
unit 120 configures encoder 100 for a region B architecture. This
is done using the Configuration signal in a manner similar to that
previously described for region A. However, in region B, encoder
100 compresses both high resolution and low resolution output
layers for transmission, in contrast to the single low resolution
output compressed for region A. This configuration is shown and
discussed in connection with Figure 6. Continuing with Figure 1,
control unit 120 allocates the system bandwidth between the high
resolution and low resolution output layers by configuring encoder
100 to compress enhancement data as a high resolution HDTV
output layer in addition to a low resolution SDTV output. This
HDTV layer provides picture refinement data to enable decoder
105 to produce a 1080 line resolution picture output from the 720
line resolution SDTV layer.

The SDTV layer output in region B is produced in the
same way as described for region A. The 720 line output from
downsampler 140 is compressed by SDTV compressor 135 to
provide SDTV layer compressed data to formatter 110. However,
in region B, the high resolution HDTV enhancement layer for
transmission is derived by HDTV compressor 125. Compressor 125
derives the HDTV output by combining and compressing an
upsampled decompressed version of the SDTV layer produced by
upsampler/decompressor 130 and previous frames of the HDTV
layer stored within compressor 125. Such a combination and
compression process involving temporal prediction performed by
compressor 125 is known and contemplated, for example, in the
spatial scalability section (section 7.7) of the MPEG standard. The
resulting HDTV and SDTV compressed outputs from encoder 100
are provided to formatter 110.

The HDTV and SDTV bitstreams from encoder 100 are
formed by formatter 110 into data packets including identification
headers and architecture information in the "Hierarchy Descriptor"
field. As described for region A, the formatted data from unit 110
is conveyed to transport processor 115, which provides the
architecture information to decoder control unit 145 for
configuring decoder 105 (here for region B).

At the receiver, in region B mode, control unit 145
disables adaptation unit 165 using the second Configuration signal.
In the resulting configuration of decoder 105, the compressed
SDTV output from processor 115 is decompressed by unit 160 to
give a 720 line resolution SDTV output, as in region A. HDTV
decompressor 150 derives a decompressed 1080 line resolution
HDTV output by combining and decompressing an upsampled
version of this decoded SDTV output produced by upsampler 155
and previous frames of the HDTV layer stored within
decompressor 150. The process of combining the upsampled and
stored data and forming a decompressed output as performed by
decompressor 150 is known and described, for example, in the
spatial scalability section (section 7.7) of the MPEG standard. The
1080 line high resolution decompressed output from
decompressor 150 is selected as the HDTV decompressed output,
via multiplexer 180, in response to the second Configuration
signal. The resulting decompressed HDTV and SDTV data outputs
from decoder 105 are suitable for further processing and
subsequent display as previously described.
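
The way the enhancement layer combines an upsampled base layer with
stored previous frames can be sketched, in highly simplified form, as
a weighted prediction followed by a residual. This is not the MPEG
spatial scalability procedure itself: motion compensation, the
transform and quantization are all omitted, and the function names,
the flat list representation of a frame and the fixed weight of 0.5
are assumptions made for this sketch.

```python
def predict(upsampled_base, previous_frame, weight=0.5):
    """Blend the spatial prediction (upsampled SDTV layer) with the
    temporal prediction (previous HDTV frame)."""
    return [weight * s + (1 - weight) * t
            for s, t in zip(upsampled_base, previous_frame)]


def encode_enhancement(hd_frame, upsampled_base, previous_frame, weight=0.5):
    """Encoder side: the enhancement data is the input frame minus the
    prediction."""
    prediction = predict(upsampled_base, previous_frame, weight)
    return [x - p for x, p in zip(hd_frame, prediction)]


def decode_enhancement(residual, upsampled_base, previous_frame, weight=0.5):
    """Decoder side: rebuild the same prediction and add the residual."""
    prediction = predict(upsampled_base, previous_frame, weight)
    return [r + p for r, p in zip(residual, prediction)]
```

Because the decoder forms the same prediction from data it already
holds, only the residual needs to be carried in the HDTV layer.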

Figure 6 shows the encoder and decoder apparatus of
Figure 1 configured for region B type encoding and decoding. The
functions of the elements shown are as previously described.
Adaptation unit 165, shown in decoder 105 of Figure 1, is absent
in Figure 6 since this element is disabled in region B mode, also as
previously described.

If the input data in Figure 1 is region C type, control
unit 120 configures encoder 100 for a region C architecture. This is
done using the Configuration signal in a manner similar to that
previously described for region A. However, in region C, encoder
100 encodes a single high resolution output rather than a low
resolution output as for region A or two outputs as for region B.
Control unit 120 allocates the full system bandwidth, if necessary,
to encode a high resolution output and configures unit 100, via the
Configuration signal, to encode the enhancement layer at a full
spatial (1080 line) HDTV resolution.

In region C mode, control unit 120 disables
downsampler 140, SDTV compressor 135 and upsampler 130, via
the Configuration signal. In the resulting configuration of encoder
100, the input sequence is compressed by HDTV compressor 125
using the full system bandwidth as required to provide a 1080
line resolution HDTV output to formatter 110. This configuration is
shown and discussed in connection with Figure 7. Continuing with
Figure 1, compressor 125 derives the HDTV output using previous
frames of the HDTV layer stored within compressor 125. The
compression process performed by compressor 125 in region C is
like that described for regions A and B and is also known.

The HDTV bitstream from unit 100 is formed by
formatter 110 into data packets including identification headers
and architecture information in the "Hierarchy Descriptor" field.
As described for region A, the formatted data from unit 110 is
conveyed to transport processor 115, which provides the
architecture information to decoder control unit 145 for
configuring decoder 105 (here for region C).




At the receiver, in region C mode, control unit 145
disables upsampler 155 using the second Configuration signal. In
the resulting configuration of decoder 105, the compressed HDTV
output from processor 115 is decompressed by unit 150 to give a
1080 line high resolution HDTV output. This 1080 line
decompressed output from decompressor 150 is selected as the
HDTV decoded output of decoder 105, via multiplexer 180, in
response to the second Configuration signal. In addition, the
compressed HDTV output from processor 115 is adapted to meet
the input requirements of SDTV decompressor 160 by adaptation
unit 165. This is done by reducing the spatial resolution of the
compressed HDTV output from processor 115 to an effective 720
line resolution in the compressed (frequency) domain. This may
be performed, for example, by discarding the higher frequency
coefficients of those Discrete Cosine Transform (DCT) coefficients
that represent the video information of the compressed HDTV
output from processor 115. This process is known and described,
for example, in "Manipulation and Compositing of MC-DCT
Compressed Video" by S. Chang et al, published in the IEEE
Journal on Selected Areas in Communications (JSAC), January 1995.
The spatially reduced compressed output from adaptation unit
165 is decompressed by unit 160 to give a 720 line resolution
SDTV output. The decompression processes performed by units
160 and 150 are like those described for region A and similarly
known. The resulting decoded HDTV and SDTV data outputs from
decoder 105 are suitable for further processing and subsequent
display as previously described.
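
The principle of reducing resolution by discarding higher frequency
DCT coefficients can be illustrated in one dimension. The sketch
below is an assumption-laden toy rather than the operation of
adaptation unit 165: it works on a single unquantized column of
samples, uses a 12-point transform reduced to 8 points to mirror the
1080 to 720 line ratio, and all function names are hypothetical.

```python
import math


def _basis(k, i, n):
    """Orthonormal DCT-II basis value for frequency k at sample i."""
    return (math.sqrt((1 if k == 0 else 2) / n)
            * math.cos(math.pi * (i + 0.5) * k / n))


def dct(samples):
    """Forward orthonormal DCT-II."""
    n = len(samples)
    return [sum(v * _basis(k, i, n) for i, v in enumerate(samples))
            for k in range(n)]


def idct(coeffs):
    """Inverse of dct()."""
    n = len(coeffs)
    return [sum(v * _basis(k, i, n) for k, v in enumerate(coeffs))
            for i in range(n)]


def discard_high_frequencies(coeffs, keep):
    """Keep the `keep` lowest-frequency coefficients and rescale them so
    that the mean level survives the change of transform length."""
    if not 0 < keep <= len(coeffs):
        raise ValueError("keep must be between 1 and len(coeffs)")
    gain = math.sqrt(keep / len(coeffs))
    return [gain * v for v in coeffs[:keep]]
```

For example, idct(discard_high_frequencies(dct(column), 8)) maps a
12-sample column to 8 samples without leaving the frequency domain
until the final inverse transform.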

Figure 7 shows the encoder and decoder apparatus of
Figure 1 configured for region C type encoding and decoding. The
functions of the elements shown are as previously described.
Downsampler 140, SDTV compressor 135 and upsampler 130,
shown in encoder 100 of Figure 1, are absent in Figure 7 since
these elements are disabled in region C mode as previously
described. Similarly, upsampler 155, shown in decoder 105 of
Figure 1, is absent in Figure 7 since this element is disabled in
region C mode.
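
The region A, B and C configurations of Figures 5 to 7 can be
summarized as a table of disabled elements per mode. The sketch below
is a bookkeeping aid only; the identifier names are invented for
illustration, and the empty entry for the encoder in region B is an
inference from the statement that both layers are compressed in that
mode rather than something stated explicitly.

```python
# Elements of decoder 105 and encoder 100 named in the description.
DECODER_105_UNITS = {
    "hdtv_decompressor_150", "upsampler_155",
    "sdtv_decompressor_160", "adaptation_unit_165",
}
ENCODER_100_UNITS = {
    "hdtv_compressor_125", "upsampler_130",
    "sdtv_compressor_135", "downsampler_140",
}

# Elements disabled by control unit 145 in each decoding mode.
DECODER_105_DISABLED = {
    "A": {"hdtv_decompressor_150", "adaptation_unit_165"},
    "B": {"adaptation_unit_165"},
    "C": {"upsampler_155"},
}

# Elements disabled by control unit 120 in each encoding mode.
ENCODER_100_DISABLED = {
    "A": {"upsampler_130", "hdtv_compressor_125"},
    "B": set(),  # inferred: both layers are compressed in region B
    "C": {"downsampler_140", "sdtv_compressor_135", "upsampler_130"},
}

# Source selected by multiplexer 180 as the HDTV output in each mode.
HDTV_OUTPUT_SOURCE = {
    "A": "upsampler_155",
    "B": "hdtv_decompressor_150",
    "C": "hdtv_decompressor_150",
}


def active_decoder_units(region):
    """Elements of decoder 105 left enabled in the given mode."""
    return sorted(DECODER_105_UNITS - DECODER_105_DISABLED[region])


def active_encoder_units(region):
    """Elements of encoder 100 left enabled in the given mode."""
    return sorted(ENCODER_100_UNITS - ENCODER_100_DISABLED[region])
```

In every mode the element selected by multiplexer 180 is among the
enabled decoder elements, which gives a quick consistency check on
the table.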

Figure 8 is a variation of Figure 1 and shows an
additional architecture configuration for region A decoding. The
functions performed by encoder 100, formatter 110 and transport
processor 115 of Figure 8 are as described for Figure 1. In
addition, the functions of decoder 109 of Figure 8 are the same as
those of decoder 105 of Figure 1 except that in region A decoding,
the 1080 line resolution HDTV decompressed output is provided in
a different manner.

In region A mode, decoder control unit 149 of Figure 8
disables, via the second Configuration signal, both upsampler 155
and adaptation unit 165. In the resulting configuration of decoder
109, the SDTV layer compressed video output from processor 115
is decompressed by SDTV decompressor 160 to provide the SDTV
output of decoder 109. This is performed in the same manner as
described for Figure 1. However, the HDTV decompressed output
from decoder 109 is produced by upsampling the SDTV layer in
the frequency domain, in contrast to the time domain sampling
performed in decoder 105 of Figure 1. The compressed output
from processor 115 in Figure 8 is upsampled in the compressed
(frequency) domain by adaptation unit 168 (not present in Figure
1). This may be performed, for example, by "zero padding" the
higher order Discrete Cosine Transform (DCT) frequency
coefficients that represent the video information in the
compressed SDTV output from processor 115. In effect, selected
higher order DCT coefficients are assigned zero values. The theory
behind this process is known and described, for example, in the
previously mentioned "Manipulation and Compositing of MC-DCT
Compressed Video" by S. Chang et al, published in the IEEE
Journal on Selected Areas in Communications (JSAC), January 1995.
The resultant upsampled output from adaptation unit 168 is
decompressed by HDTV decompressor 152 to provide the HDTV
output from decoder 109. The resulting decompressed HDTV and
SDTV data outputs from decoder 109 are suitable for processing
and subsequent display as described in connection with Figure 1.
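
Zero padding in the DCT domain can likewise be illustrated in one
dimension. As a sketch under stated assumptions (a single unquantized
column of samples, an 8-point transform extended to 12 points to
mirror the 720 to 1080 line ratio, and hypothetical function names),
the higher order coefficients are simply appended as zeros before the
longer inverse transform is applied.

```python
import math


def _basis(k, i, n):
    """Orthonormal DCT-II basis value for frequency k at sample i."""
    return (math.sqrt((1 if k == 0 else 2) / n)
            * math.cos(math.pi * (i + 0.5) * k / n))


def dct(samples):
    """Forward orthonormal DCT-II."""
    n = len(samples)
    return [sum(v * _basis(k, i, n) for i, v in enumerate(samples))
            for k in range(n)]


def idct(coeffs):
    """Inverse of dct()."""
    n = len(coeffs)
    return [sum(v * _basis(k, i, n) for k, v in enumerate(coeffs))
            for i in range(n)]


def zero_pad(coeffs, size):
    """Extend the coefficient list to `size` entries by assigning zero
    to the added higher order coefficients, rescaling the existing ones
    so that the mean level is preserved."""
    if size < len(coeffs):
        raise ValueError("size must not be smaller than len(coeffs)")
    gain = math.sqrt(size / len(coeffs))
    return [gain * v for v in coeffs] + [0.0] * (size - len(coeffs))
```

Here idct(zero_pad(dct(column), 12)) turns an 8-sample column into 12
samples; no new detail is created, since the added coefficients carry
no energy.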

Figure 9 is a variation of Figure 1 and shows an
additional architecture configuration for region C decoding. The
functions performed by encoder 100, formatter 110 and transport
processor 115 of Figure 9 are as described for Figure 1. In
addition, the functions of decoder 107 of Figure 9 are the same as
those of decoder 105 of Figure 1 except that in region C decoding,
the 720 line resolution SDTV decompressed output is provided in
a different manner.

In region C mode, control unit 147 of Figure 9 disables,
via the second Configuration signal, both upsampler 155 and SDTV
decompressor 162. In the resulting configuration of decoder 107,
the HDTV layer compressed video output from processor 115 is
decompressed by HDTV decompressor 150 to provide the HDTV
output of decoder 107. This is performed in the same manner as
described for Figure 1. However, the SDTV decompressed output
from decoder 107 is produced by downsampling the HDTV layer
in the time domain, in contrast to the frequency domain sampling
performed in decoder 105 of Figure 1. The decompressed HDTV
output from multiplexer 180 in Figure 9 is downsampled by
downsampler 170 (not present in Figure 1) by a factor of 2/3 to
provide a 720 line output. This may be performed by a variety of
known methods as discussed with respect to downsampler 140 of
encoder 100 in Figure 1. The 720 line resolution decompressed
output from downsampler 170 is selected as the SDTV decoded
output of decoder 107, via multiplexer 175 (not present in Figure
1), in response to the second Configuration signal. The resulting
decompressed HDTV and SDTV data outputs from decoder 107 are
suitable for processing and subsequent display as described in
connection with Figure 1.

The encoder and decoder architectures discussed with
respect to Figures 1-9 are not exclusive. Other architectures may
be derived for the individual regions (A, B and C) that could
accomplish the same goals. Further, the functions of the elements
of the various architectures may be implemented in whole or in
part within the programmed instructions of a microprocessor.

CLAIMS:

1. A digital signal processing system for adaptively
processing a datastream containing input image representative
pixel data and for providing output data representing said input
image data, said system comprising:

a data processor (100) with an input for receiving said
datastream, and for automatically partitioning said input image
representative data into a variable number of hierarchical layers
as a function of input data rate to provide partitioned data; and

an output processor (110) responsive to said
partitioned data from said data processor for formatting said
partitioned data to be compatible with the requirements of an
output transmission channel to provide said output data.

2. A system according to claim 1, wherein

said data processor varies the number of pixels per
image of said partitioned data.

3. A system according to claim 2, wherein

the number of pixels representing an image in said
output data, as conveyed to said transmission channel, is less than
the number of pixels representing a corresponding image in said
datastream.

4. A system according to claim 1, wherein

said data processor partitions said input image
representative data into a plurality of hierarchical layers of data
respectively corresponding to different numbers of pixels per
image.

5. A system according to claim 1, wherein

said data processor comprises a plurality of data
processing networks including compression networks and
sampling networks; and

the signal processing configuration of said data
processing networks is automatically adapted as a function of
parameters including at least one of available system bandwidth,
input data rate, and signal quality.

6. A system according to claim 1, further including

a control network (120) for providing a configuration
Control signal as a function of the data rate of said input image
data; and wherein

said data processor comprises a compression network
(125-140) responsive to said Control signal for providing
compressed image representative data as said partitioned data,
wherein both the number of pixels per image of said compressed
image representative data and said number of hierarchical layers
are determined by the configuration of said compression network
in response to said configuration Control signal.

7. Apparatus according to claim 6, wherein

said compression network provides, in one
configuration, output compressed data as a single layer with a
number of pixels per image substantially equal to the number of
pixels per image of said image representative input data, and in
another configuration, output compressed image data as a single
layer with a number of pixels per image less than the number of
pixels per image of said image representative input data.

8. Apparatus according to claim 6, wherein

output data produced by said output processor is in
MPEG compatible format and contains information indicating the
configuration of said compression network, wherein said
indicating information is encoded in an MPEG format field selected
from at least one of the Hierarchy Descriptor field and the User
Data field.

9. Apparatus according to claim 6, wherein

said number of hierarchical layers is varied in
response to said configuration Control signal on a periodic basis
defined by at least one interval selected from (a) an interval
corresponding to the duration of a group of pictures, (b) an
interval corresponding to the duration of a program stream, and
(c) an interval corresponding to the duration of a picture block.

10. Apparatus according to claim 6, wherein 

said number of hierarchical layers is varied on a 
temporal basis. 

11. Apparatus according to claim 6, wherein

said control network provides said configuration
Control signal as a function of available bandwidth.

12. Apparatus according to claim 6, wherein

said control network provides said configuration
Control signal as a function of the image quality required of output
data from said output processor.

13. A digital signal processing system for adaptively
decoding a datastream including image representative input pixel
data partitioned into one or more hierarchical layers, said system
comprising:

a processor (115) for deriving synchronization
information from said datastream;

a controller (145) for deriving configuration
information from said datastream and providing a Control signal
representing said configuration information; and

a decoder (150-180) for decoding said input pixel data
using said synchronization information, said decoder being
adaptively configured to decode the number of said hierarchical
layers of said input pixel data in response to said Control signal.

14. A system according to claim 13, wherein

said decoder exhibits one configuration for decoding a
single hierarchical layer of image representative data containing a
first number of pixels per image, and exhibits another
configuration for decoding a single hierarchical layer of image
representative data at a reduced second number of pixels per
image.

15. A system according to claim 13, wherein

said decoder decodes a plurality of hierarchical layers
corresponding to different numbers of pixels per image.

16. A system according to claim 13, wherein

the number of pixels of an image represented by
output data from said decoder is different from the number of
pixels of an image represented by said input pixel data.




17. A system according to claim 13, wherein

said input pixel data is compressed data; and

said decoder decompresses said number of said
hierarchical layers of said input pixel data and the number of
pixels per image of said input pixel data in response to said
Control signal.

18. Apparatus according to claim 17, wherein

said input pixel data is MPEG compatible data; and

said control network derives configuration information
from a field of said MPEG compatible data, said field being
selected from one of a User Data field, a Hierarchy Descriptor field
and a Bit Rate field.

19. Apparatus according to claim 17, wherein

said configuration information represents the data rate
of said input pixel data.

20. Apparatus according to claim 17, wherein

said decoder (a) in one configuration provides one
layer of decompressed image data with a number of pixels per
image substantially equal to the number of pixels per image of
said input pixel data, and (b) in another configuration provides
one layer of decompressed image data with a number of pixels per
image less than the number of pixels per image of said input pixel
data.

21. Apparatus according to claim 17, wherein

said input pixel data is subject to exhibiting variable
image resolution as a function of the number of pixels per image;
and

said decoder decompresses said input pixel data to
provide decompressed image data with a predetermined number
of pixels per image, wherein said input pixel data is subject to
exhibiting at least two different numbers of pixels per image
corresponding to different image resolutions.

22. Apparatus according to claim 21, wherein

said decoder is adaptively configured on a temporal
basis.